Resource Scheduling System and Method under Graphics Processing Unit Virtualization Based on Instant Feedback of Application Effect

ABSTRACT

A physical Graphics Processing Unit (GPU) resource scheduling system and method between virtual machines, based on an instant effect feedback of a guest application, are provided. An agent is inserted between a host physical GPU HostOps dispatch and a host physical GPU guest application interface through a hooking method, for delaying sending instructions and data in the host physical GPU HostOps dispatch, monitoring a relevant display performance condition of a GPU guest application in the virtual machine and a use condition of physical GPU resources, and providing a feedback to any GPU resource scheduling algorithm based on time or a time sequence. With the agent, the method requires no modification to a virtual machine guest application, a host operating system, a virtual machine operating system, a GPU driver or a virtual machine manager. The present invention does not require stopping a machine operation.

CROSS REFERENCE OF RELATED APPLICATION

This is a U.S. National Stage under 35 U.S.C. 371 of the International Application PCT/CN2013/077457, filed Jun. 19, 2013, which claims priority under 35 U.S.C. 119(a-d) to CN 201210261862.0, filed Jul. 26, 2012.

BACKGROUND OF THE PRESENT INVENTION

Field of Invention

The present invention relates to a system and a method applied to the technical field of computer application, especially a system and a method for physical Graphics Processing Unit (GPU) resource scheduling between virtual machines based on an instant effect feedback of a guest application. More specifically, the present invention relates to a resource scheduling system under GPU virtualization based on the instant feedback of the application effect.

Description of Related Arts

Graphics Processing Unit (GPU) virtualization technology is being widely used in data centers performing GPU computing. The GPU computing includes but is not limited to cloud gaming, video rendering and general purpose GPU computing.

However, an effective system and method for scheduling the physical GPU resources between the virtual machines is still lacking, namely one that lets all guest applications running in parallel in multiple virtual machines gain a relatively good GPU acceleration effect while keeping a high resource utilization rate. The existing GPU Video Graphics Array (VGA) passthrough method assigns each usable physical GPU to one running virtual machine. However, the method has the following disadvantages. Firstly, a general commercial mainboard supports only two to three GPUs, so a specially made mainboard is needed to simultaneously run multiple virtual machines that need GPU support. Secondly, a running virtual machine generally cannot exhaust the physical GPU resources it owns. With the GPU VGA passthrough method, the remaining GPU resources cannot be assigned to other virtual machines, so physical GPU resources are wasted.

Another method is to utilize GPU paravirtualization technology to make multiple virtual machines share one or more physical GPUs. In 2009, the paper GPU Virtualization on VMware's Hosted I/O Architecture, published in SIGOPS Operating Systems Review, Volume 43, Issue 3, proposed such a method and system. In Graphic Engine Resource Management, presented at Multimedia Computing and Networking in 2008, Bautin M. et al. proposed a scheduling strategy that distributes physical GPU resources evenly among multiple applications. Then, at Usenix ATC in 2011, Kato et al., in the paper Timegraph: GPU Scheduling for Real-time Multi-tasking Environments, proposed improving the ability of the physical GPU to accelerate key user programs by introducing a GPU resource use priority and modifying the GPU driver of the operating system.

The above methods can maximize the use of physical GPU resources and at the same time provide the GPU acceleration ability for multiple virtual machines. However, the methods have the following disadvantages. On one hand, it is necessary to modify the operating system or the GPU driver, and when the methods are applied to virtual machines, it is even necessary to modify the virtual machine hypervisor or the guest applications in the virtual machines, so development is difficult. On the other hand, because the available methods cannot obtain feedback data on the operating effect of the accelerated guest applications, the available systems and methods schedule physical GPU resources blindly, and the resulting scheduling effect is mediocre.

SUMMARY OF THE PRESENT INVENTION

In view of the above disadvantages in existing technology, the present invention provides a physical Graphics Processing Unit (GPU) resource scheduling system and method between virtual machines based on an instant guest application effect feedback. According to the traditional GPU virtualization technology, GPU commands and data in a virtual machine are sent to a host physical GPU guest application interface, Host GPU API, through a host physical GPU HostOps Dispatch. On the basis of the traditional GPU virtualization technology, the method provided by the present invention adopts an agent inserted between the GPU HostOps Dispatch and the Host GPU API through a hooking method, for delaying sending instructions and data in the GPU HostOps Dispatch, and at the same time monitoring a relevant display performance condition of the guest application and a use condition of physical GPU resources, and then providing a feedback to any GPU resource scheduling algorithm based on time or a time sequence. A GPU resource scheduling algorithm based on time or a time sequence is one in which the starting, ending and continuation of a use of the GPU resources are partially or wholly based on an absolute or relative time.

In addition, through a scheduling controller, the system provided by the present invention instantly accepts a decision of users to start or stop using the agent, changes options and parameters of the scheduling method in use, and instantly changes the corresponding parameter settings of the agent. At the same time, the scheduling controller displays or records one or more items of a scheduling and using condition of current physical GPU resources, a use condition of guest application GPU resources in all virtual machines, etc.

The present invention adopts an advanced prediction technology; in cooperation with delaying sending the instructions and the data in the host physical GPU HostOps dispatch, a frame latency is accurately controlled. The advanced prediction technology comprises a frame rendering performance prediction and a flush single queued frame, wherein: the flush single queued frame comprises a mark flush frame and a commit flush frame; the mark flush frame is optional, for marking one frame of the virtual machine in the queue (including but not limited to a previous frame or previous frames), wherein the marked frame is shown as a frame required to be removed from a buffer (including but not limited to a forced display); and the commit flush frame forces one frame (the marked frame, if the mark flush frame has been executed) to be removed from the buffer of the physical GPU, so that the buffer of the physical GPU has enough space.

The system and method provided by the present invention require no change to a host operating system, a host GPU driver, a hypervisor, a virtual machine operating system, a virtual machine GPU driver or the guest applications in the virtual machines. In addition, the system and method incur a performance cost of less than 5% in operation, and starting or stopping their use causes no significant virtual machine pause (only a pause on the order of milliseconds is necessary).

The present invention is realized through following technical schemes.

The present invention provides a resource scheduling system under GPU virtualization based on an instant feedback of an application effect, comprising a host physical GPU HostOps dispatch, a host physical GPU guest application interface, an agent and a scheduling controller; wherein:

-   the agent is connected between the host physical GPU HostOps dispatch and the host physical GPU guest application interface;
-   the scheduling controller is connected with the agent; and
-   the scheduling controller receives user commands and delivers the user commands to the agent; the agent receives the user commands coming from the scheduling controller, monitors an operating condition of a guest application, and transmits GPU condition results of the guest application to the scheduling controller, and at the same time calculates periodically or calculates on an event basis a delay time necessary to meet a lowest guest application GPU condition according to a scheduling algorithm designated by the scheduling controller, and delays sending instructions and data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface; and, the scheduling controller receives, processes and displays scheduling results and conditions coming from the agent.

Preferably, the scheduling controller receives the user commands, analyzes operations to the agent, a configuration of the scheduling algorithm and corresponding parameters in the user commands, delivers the user commands to the agent, receives the GPU condition results coming from the agent and displays them to users.

Preferably, the scheduling controller comprises:

-   a control console, for receiving the user commands, wherein: the user commands input the configuration of the scheduling algorithm and the corresponding parameters, and the control console acquires the scheduling results from a scheduling communicator and displays them to the users; and
-   the scheduling communicator, responsible for a communication between the scheduling controller and one or more agents, loading/unloading the agent, delivering the user commands to the agent, and receiving the GPU condition results of the guest application coming from the agent.

Preferably, the agent comprises:

-   a scheduler, for receiving designations in the user commands about the configuration of the scheduling algorithm and the corresponding parameters, finding a position of the corresponding scheduling algorithm, configuring the scheduling algorithm and operating the scheduling algorithm, and delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface as required; and
-   a guest application GPU condition hypervisor, for collecting a GPU condition coming from the host physical GPU guest application interface, generating the GPU condition results of the guest application through the GPU condition, and at the same time feeding back the GPU condition results of the guest application to the scheduler and delivering to the scheduling communicator in the scheduling controller.

Preferably, the GPU condition of the guest application comprises: measurements of a GPU physical condition and/or a GPU logic condition relevant to a guest application variety. The measurements of the GPU physical condition comprise a GPU load, a temperature, and a voltage. For computer three-dimensional games, the measurements of the GPU logic condition comprise frames per second (FPS); and for computer general purpose GPU operations, the measurements of the GPU logic condition comprise operations per second (Ops), a GPU load (application GPU usage) of the guest application.

According to the resource scheduling system described above, the present invention further provides a GPU resource scheduling method under the GPU virtualization. The agent is inserted between the host physical GPU HostOps dispatch and the host physical GPU guest application interface through a hooking method, for delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface and at the same time monitoring a relevant display performance condition of the guest application and a use condition of physical GPU resources, and then providing a feedback to any GPU resource scheduling algorithm based on time or a time sequence. With the agent, the GPU resource scheduling method requires no modification to a virtual machine guest application, a host operating system, a virtual machine operating system, a GPU driver or a virtual machine manager, and the performance loss is low.

The GPU resource scheduling method under the GPU virtualization comprises steps of: after finishing starting one or more virtual machines, when a customer needs to install the resource scheduling system, through means operated by the guest application, finding a process by the scheduling controller or designating a process by the user, and binding the agent to the corresponding virtual machine according to the process; then establishing a communication between the scheduling communicator in the scheduling controller and the bound agent; when scheduling GPU resources, issuing an instruction, selecting the scheduling algorithm (which can be a scheduling algorithm developed by a third party) and providing the corresponding parameters by the customer; receiving the instruction from the customer by the control console, and then sending the user commands to the agent by the scheduling communicator; according to the user commands, by the agent, configuring and operating the selected GPU resource scheduling algorithm, and delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface; at the same time, by the guest application GPU condition hypervisor, collecting the GPU condition coming from the host physical GPU guest application interface, generating a GPU condition of the guest application, then feeding back periodically or on the event basis the GPU condition results of the guest application to the scheduler, and delivering to the scheduling communicator in the scheduling controller; when the customer needs to unload the resource scheduling system, issuing an unloading instruction through the scheduling controller by the customer; receiving the unloading instruction by the control console, sending the user commands to the agent by the scheduling communicator, and, by the agent, receiving the unloading instruction and stopping an operation of the agent.

Preferably, the GPU resource scheduling method adopts a GPU resource usage advanced prediction method. With a cooperation of delaying sending the instructions and the data in the host physical GPU HostOps dispatch, an accurate control of a frame latency is realized. The GPU resource usage advanced prediction method comprises a frame rendering performance prediction, and a flush single queued frame, wherein:

-   the frame rendering performance prediction comprises steps of: according to a historic record of a consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting a current consumption time of the physical GPU resources; and
-   the flush single queued frame comprises a mark flush frame and a commit flush frame; wherein the mark flush frame is optional, and comprises a step of: marking a frame (including but not limited to a previous frame or previous frames) of the virtual machine in the queue, wherein the marked frame is shown as a frame required to be removed from a buffer (including but not limited to a forced display of the frame); and the commit flush frame comprises a step of: forcing a frame to be removed from the buffer of a physical GPU, wherein the removed frame is the marked frame if the mark flush frame is executed, in such a manner that the buffer of the physical GPU has enough space.
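As an illustration of the frame rendering performance prediction, the following Python sketch predicts the next consumption time from a bounded historic record. The class name FrameCostPredictor, the window size and the use of a simple moving average are assumptions made for illustration; the text above does not specify how the historic record is turned into a prediction.

```python
from collections import deque


class FrameCostPredictor:
    """Predict the next GPU consumption time from recent history.

    Hypothetical helper: a moving average over the last `window` samples
    is an assumed prediction rule, not one stated in the specification.
    """

    def __init__(self, window=8, default_ms=0.0):
        self.history = deque(maxlen=window)  # bounded historic record
        self.default_ms = default_ms         # returned when no history exists

    def record(self, gpu_ms):
        """Upload one measured consumption time to the historic record."""
        self.history.append(gpu_ms)

    def predict(self):
        """Return the predicted consumption time in milliseconds."""
        if not self.history:
            return self.default_ms
        return sum(self.history) / len(self.history)
```

A bounded record keeps the prediction responsive to recent frames; a longer window would smooth out spikes at the cost of slower adaptation.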

Preferably, the step of “binding the agent to the corresponding virtual machine” comprises steps of:

Step 1.1, according to information designated by the user, finding image rendering processes of the designated virtual machine, wherein, depending on the virtual machine manager design, the image rendering process may be the virtual machine process itself, or selecting all of the image rendering processes of the relevant virtual machine, and executing following Step 1.2 to Step 1.6 for each image rendering process of the virtual machine;

Step 1.2, creating a new thread in the process and loading one agent in the process;

Step 1.3, visiting an entrance of the agent, and initializing the agent;

Step 1.4, finding an address set of the host physical GPU guest application interface loaded by the process, modifying a code at each address of the host physical GPU guest application interface, pointing the code at an entrance of a corresponding handler in the agent and saving contents of all registers by the code, so that the process will run the handler each time when using the host physical GPU guest application interface in future;

Step 1.5, setting a return address of the handler as an old host physical GPU guest application interface address, running the instruction, and resuming the contents of all the registers, so that the handler is able to correctly execute an original host physical GPU guest application interface after ending an operation of the handler; and

Step 1.6, not ending the thread.
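The redirection in Step 1.4 and Step 1.5 can be sketched in Python as follows. This is only an analogy: the actual method patches machine code at the interface addresses inside the rendering process, whereas the sketch replaces entries in a table of callables. The names api_table, install_hooks and the handler signature are hypothetical.

```python
def install_hooks(api_table, handler):
    """Redirect every entry of api_table through handler.

    api_table is a hypothetical dict mapping interface names to callables;
    it stands in for the address set of the Host GPU API found in Step 1.4.
    Returns the saved original entries so they can be restored later.
    """
    originals = dict(api_table)  # remember the old interface addresses (Step 1.5)
    for name, original in originals.items():
        # Default arguments bind the current name and original to this wrapper.
        def hooked(*args, _name=name, _original=original, **kwargs):
            # The handler runs first and decides when to call the original.
            return handler(_name, _original, *args, **kwargs)
        api_table[name] = hooked  # point the entry at the handler (Step 1.4)
    return originals


def passthrough_handler(name, original, *args, **kwargs):
    """Example handler that does nothing except call the original interface."""
    return original(*args, **kwargs)
```

After install_hooks(table, passthrough_handler), every call made through the table reaches the handler first and then the original interface, which mirrors the behaviour described for the hooked process.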

Preferably, if using forecasting techniques, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of:

Step 2.1a, in the handler designated by the GPU resource scheduling algorithm, according to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting the current consumption time of the physical GPU resources, and stopping counting a current consumption time of a Central Processing Unit (CPU);

Step 2.2a, stopping an execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the predicted current consumption time of the physical GPU resources;

Step 2.3a, starting counting the current consumption time of the physical GPU resources;

Step 2.4a, calling the original host physical GPU guest application interface; and

Step 2.5a, stopping counting the current consumption time of the physical GPU resources, and uploading the current consumption time of the physical GPU resources to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface.
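Step 2.1a to Step 2.5a can be sketched as a handler in Python. The delay rule used here (sleep for whatever remains of a fixed per-frame target after the CPU time and the predicted GPU time) and the moving-average prediction are assumptions chosen for illustration; the specification leaves both to the scheduling algorithm. The names make_predictive_handler, target_ms, clock and sleep are hypothetical, and timing the interface call from the CPU side only approximates the GPU consumption time.

```python
import time


def make_predictive_handler(target_ms, history,
                            clock=time.perf_counter, sleep=time.sleep):
    """Build a handler that delays before calling the original interface.

    target_ms: assumed per-frame time budget in milliseconds.
    history:   list of past GPU consumption times in milliseconds.
    """
    state = {"cpu_start": clock()}  # start counting CPU time

    def handler(original, *args, **kwargs):
        # Step 2.1a: predict the GPU time and stop counting the CPU time.
        predicted_gpu_ms = sum(history) / len(history) if history else 0.0
        cpu_ms = (clock() - state["cpu_start"]) * 1000.0
        # Step 2.2a: stop execution for the computed period (assumed rule).
        delay_ms = max(0.0, target_ms - cpu_ms - predicted_gpu_ms)
        sleep(delay_ms / 1000.0)
        # Step 2.3a: start counting the GPU consumption time.
        gpu_start = clock()
        # Step 2.4a: call the original interface.
        result = original(*args, **kwargs)
        # Step 2.5a: stop counting and upload the sample to the history.
        history.append((clock() - gpu_start) * 1000.0)
        state["cpu_start"] = clock()  # begin counting CPU time for the next call
        return result

    return handler
```

Passing clock and sleep as parameters is a design choice that lets the handler be exercised with a fake clock instead of real waiting.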

Preferably, if no forecasting technique is used, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of:

Step 2.1b, in the handler designated by the GPU resource scheduling algorithm, stopping counting the current consumption time of the CPU and starting counting the current consumption time of the physical GPU resources;

Step 2.2b, calling the original host physical GPU guest application interface;

Step 2.3b, stopping counting the current consumption time of the physical GPU resources; and

Step 2.4b, stopping the execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the current consumption time of the physical GPU resources.
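Without forecasting, the delay is applied after the call and uses the measured rather than the predicted GPU time. A minimal Python sketch of Step 2.1b to Step 2.4b follows, again assuming a fixed per-frame target as the delay rule; make_reactive_handler, target_ms, clock and sleep are hypothetical names.

```python
import time


def make_reactive_handler(target_ms, clock=time.perf_counter, sleep=time.sleep):
    """Build a handler that delays after calling the original interface."""
    state = {"cpu_start": clock()}  # start counting CPU time

    def handler(original, *args, **kwargs):
        # Step 2.1b: stop counting CPU time, start counting GPU time.
        now = clock()
        cpu_ms = (now - state["cpu_start"]) * 1000.0
        # Step 2.2b: call the original interface.
        result = original(*args, **kwargs)
        # Step 2.3b: stop counting GPU time.
        gpu_ms = (clock() - now) * 1000.0
        # Step 2.4b: stop execution for the remaining budget (assumed rule).
        sleep(max(0.0, target_ms - cpu_ms - gpu_ms) / 1000.0)
        state["cpu_start"] = clock()  # begin counting CPU time for the next call
        return result

    return handler
```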

Preferably, if using the GPU resource usage advanced prediction method, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of:

Step 2.1c, in the handler designated by the GPU resource scheduling algorithm, proceeding the commit flush frame in the flush single queued frame; through the commit flush frame, forcing one frame to be removed, wherein if the mark flush frame is executed, the frame is the marked frame, in such a manner that the buffer of the physical GPU has the enough space; stopping counting the current consumption time of the CPU;

Step 2.2c, by the frame rendering performance prediction, according to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting the current consumption time of the physical GPU resources;

Step 2.3c, stopping the execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the predicted current consumption time of the physical GPU resources;

Step 2.4c, starting counting the current consumption time of the physical GPU resources;

Step 2.5c, calling the original host physical GPU guest application interface;

Step 2.6c, stopping counting the current consumption time of the physical GPU resources; and

Step 2.7c, starting counting a next consumption time of the CPU; executing the mark flush frame in the flush single queued frame, and marking one frame (including but not limited to a previous frame or previous frames) of the virtual machine in the queue by the mark flush frame, wherein the marked frame is shown as a frame required to be removed from the buffer (including but not limited to a forced display of the frame); and uploading the current consumption time of the physical GPU resources to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface.
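The mark flush frame and commit flush frame operations used in Step 2.1c and Step 2.7c can be modelled with a small queue. In this Python sketch the class QueuedFrames and its methods are hypothetical; marking the most recently queued frame, and removing the oldest frame when nothing is marked, are assumptions, since the text only requires that a marked frame, if any, is the one removed.

```python
from collections import deque


class QueuedFrames:
    """Toy model of frames queued in the physical GPU buffer."""

    def __init__(self):
        self.frames = deque()
        self.marked = None  # frame currently marked for removal, if any

    def enqueue(self, frame_id):
        self.frames.append(frame_id)

    def mark_flush(self):
        """Mark one queued frame for removal (assumed: the newest one)."""
        self.marked = self.frames[-1] if self.frames else None
        return self.marked

    def commit_flush(self):
        """Force one frame out of the buffer and return it.

        Removes the marked frame when one exists; otherwise removes the
        oldest frame (an assumed fallback). Returns None on an empty queue.
        """
        if not self.frames:
            self.marked = None
            return None
        if self.marked is not None and self.marked in self.frames:
            frame = self.marked
            self.frames.remove(frame)
        else:
            frame = self.frames.popleft()
        self.marked = None
        return frame
```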

Preferably, the step of collecting the GPU condition coming from the host physical GPU guest application interface by the guest application GPU condition hypervisor comprises steps of:

Step 3.1, in the handler designated by the GPU resource scheduling algorithm, calling the host physical GPU guest application interface, an operating system kernel or an interface provided by a GPU driver; and according to requirements of the GPU resource scheduling algorithm and the user commands, collecting the GPU condition, such as the GPU load, the temperature, the voltage, the FPS, the Ops, and the GPU load of the guest application; and

Step 3.2, in the handler designated by the GPU resource scheduling algorithm, calling the original host physical GPU guest application interface.
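Of the conditions listed in Step 3.1, the FPS can be derived inside the handler itself by counting calls to the frame presentation interface. The following Python sketch shows one way to do so; the class FpsCounter and its method names are hypothetical, and physical measurements such as temperature or voltage would instead come from the driver or operating system interfaces mentioned above.

```python
import time


class FpsCounter:
    """Count presented frames and convert them to frames per second."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.window_start = clock()  # start of the current measuring window
        self.frames = 0

    def on_frame(self):
        """Call once per hooked frame presentation."""
        self.frames += 1

    def sample(self):
        """Return the FPS since the last sample and start a new window."""
        now = self.clock()
        elapsed = now - self.window_start
        fps = self.frames / elapsed if elapsed > 0 else 0.0
        self.window_start = now
        self.frames = 0
        return fps
```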

Preferably, the step of “generating a GPU condition of the guest application” comprises steps of:

Step 4.1, designating a condition reporting frequency by the user, and acquiring the condition reporting frequency in the agent;

Step 4.2, when a condition reporting point comes, by the guest application GPU condition hypervisor in the agent, sending an accumulative condition result to the scheduling communicator in the scheduling controller; and

Step 4.3, emptying a condition result buffer of the agent by the agent.
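Step 4.1 to Step 4.3 amount to buffering condition results and flushing them at a user-designated interval. A minimal Python sketch, assuming the reporting frequency is given as an interval in seconds and that the scheduling communicator is reachable through a callable named send (both hypothetical):

```python
import time


class ConditionReporter:
    """Accumulate condition results and report them at a fixed interval."""

    def __init__(self, report_interval_s, send, clock=time.monotonic):
        self.interval = report_interval_s  # Step 4.1: user-designated frequency
        self.send = send                   # stands in for the scheduling communicator
        self.clock = clock
        self.last_report = clock()
        self.buffer = []                   # condition result buffer of the agent

    def add(self, result):
        """Store one result; report and empty the buffer when a reporting point comes."""
        self.buffer.append(result)
        now = self.clock()
        if now - self.last_report >= self.interval:
            self.send(list(self.buffer))   # Step 4.2: send the accumulated results
            self.buffer.clear()            # Step 4.3: empty the buffer
            self.last_report = now
```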

Preferably, the step of “receiving the unloading instruction and stopping an operation of the agent by the agent” comprises steps of:

Step 5.1, receiving the unloading instruction by the agent, executing following Step 5.2 to Step 5.3 by the agent;

Step 5.2, resuming the address set of the host physical GPU guest application interface loaded by the process, modifying the code at the address of each host physical GPU guest application interface to a content at an address of the original guest application interface, so that the process will run a logic of the original guest application interface each time when using the host physical GPU guest application interface in future; and

Step 5.3, ending the thread inserted into a process of binding the agent into the corresponding virtual machine, and unloading the agent.
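The unloading sequence can be sketched together with the thread that is kept alive while the agent is bound. In this Python analogy the interface addresses are again modelled as a table of callables, and the class Agent with its load and unload methods is hypothetical; the real method restores the original code bytes at each interface address.

```python
import threading


class Agent:
    """Toy agent that hooks a table of callables and can be unloaded."""

    def __init__(self, api_table):
        self.api_table = api_table
        self.originals = {}
        self.stop_event = threading.Event()
        # The thread stays alive until unload, mirroring "not ending the thread".
        self.thread = threading.Thread(target=self.stop_event.wait, daemon=True)

    @staticmethod
    def _wrap(handler, original):
        def hooked(*args, **kwargs):
            return handler(original, *args, **kwargs)
        return hooked

    def load(self, handler):
        """Save the original entries, install the handler, start the thread."""
        self.originals = dict(self.api_table)
        for name, original in self.originals.items():
            self.api_table[name] = self._wrap(handler, original)
        self.thread.start()

    def unload(self):
        """Step 5.2: restore the original entries. Step 5.3: end the thread."""
        self.api_table.update(self.originals)
        self.originals = {}
        self.stop_event.set()
        self.thread.join()
```

Because only table entries are swapped, the hooked process keeps running throughout loading and unloading, which corresponds to the claim that machine operation need not be stopped.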

Preferably, the GPU resource scheduling algorithm comprises following steps of:

Step 6.1, for virtual machine groups of VM1, VM2 . . . to VMn, analyzing a user method configuration by the scheduler in the agent of each virtual machine, and obtaining a minimum GPU load, minimum frames per second to be met (an application scope of the present invention is not limited to computer games, and for other GPU applications, measurements for different conditions are feasible), and a testing period T designated by the user;

Step 6.2, during operation, calling the handler for multiple times; and for each call of the handler executing Step 2.1a to Step 2.5a with the forecasting techniques; or executing Step 2.1b to Step 2.4b without the forecasting techniques;

Step 6.3, for each testing period T, if a virtual machine VMm does not satisfy a condition measurement, finding and reducing a setting of minimum frames per second of a virtual machine having maximum and minimum frames per second, wherein a reduced magnitude of the frames per second is determined by an application GPU load of the guest application for recent frames; and the frames per second and the application GPU load for the recent frames have a linear relation;

Step 6.4, for each testing period T, if a utilization rate of the physical GPU fails to meet the minimum GPU load, increasing a setting of the minimum frames per second for all of the virtual machines; wherein an increased magnitude of the frames per second is determined by the application GPU load of the guest application for the recent frames, and the frames per second and the application GPU load of the guest application for the recent frames have the linear relation; and

Step 6.5, keeping Step 6.2 to Step 6.4 valid until the method designated by the user ends or the method is changed or the agent is unloaded.
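One pass of Step 6.3 and Step 6.4 per testing period T might look like the following Python sketch. It rests on several loudly stated assumptions: the virtual machine whose setting is reduced is taken to be the one with the highest minimum-FPS setting; the linear relation is taken to be FPS = k x load with k estimated as the measured fps divided by the measured load; and spare GPU load is shared equally among the virtual machines. The function adjust_targets and the dictionary keys target_fps, fps and load are hypothetical.

```python
def adjust_targets(vms, min_gpu_load, gpu_utilization):
    """Adjust the minimum-FPS settings once per testing period T.

    vms: dict of name -> {"target_fps", "fps", "load"}, where fps and load
    are recent measurements and load is a fraction of the physical GPU.
    """
    def fps_per_load(vm):
        # Assumed linear relation between FPS and application GPU load.
        return vm["fps"] / vm["load"] if vm["load"] > 0 else 0.0

    # Step 6.3: some VM misses its setting, so lower the highest setting.
    starved = [vm for vm in vms.values() if vm["fps"] < vm["target_fps"]]
    if starved:
        donor = max(vms.values(), key=lambda vm: vm["target_fps"])
        for vm in starved:
            ratio = fps_per_load(vm)
            needed_load = (vm["target_fps"] - vm["fps"]) / ratio if ratio > 0 else 0.0
            reduced = donor["target_fps"] - needed_load * fps_per_load(donor)
            donor["target_fps"] = max(0.0, reduced)

    # Step 6.4: the physical GPU is under-used, so raise every setting.
    if vms and gpu_utilization < min_gpu_load:
        spare_per_vm = (min_gpu_load - gpu_utilization) / len(vms)
        for vm in vms.values():
            vm["target_fps"] += spare_per_vm * fps_per_load(vm)
    return vms
```

Calling the function once per period, until the algorithm is changed or the agent is unloaded, corresponds to Step 6.5.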

According to the present invention, one agent is installed in the host physical GPU HostOps Dispatch corresponding to each virtual machine, and the agent is owned independently by the host physical GPU HostOps Dispatch. A single, globally available scheduling controller is connected with one or more agents. Compared with the prior art, the present invention has the following advantages. Firstly, no modification is required to the guest application of the virtual machine, the host operating system, the operating system of the virtual machine, the GPU driver or the virtual machine manager. The existing systems usually need to modify one of the above items extensively to realize a similar scheduling ability, and such modification may force the existing systems to be updated continuously to remain compatible with the latest guest application, operating system, GPU driver, etc. Secondly, according to the present invention, it is unnecessary to stop the machine operation temporarily for installation or unloading, which allows the system to be deployed easily in a commercial system and makes the system particularly applicable to a commercial server that operates 7×24 hours. Finally, while significantly improving the GPU resource scheduling ability between the virtual machines, the present invention performs well, with a general performance loss of less than 5%.

These and other objectives, features, and advantages of the present invention will become apparent from the following detailed description, the accompanying drawings, and the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a module schematic diagram of a resource scheduling system under Graphics Processing Unit (GPU) virtualization based on an instant feedback of an application effect according to a preferred embodiment of the present invention.

FIG. 2 is a framework schematic diagram of the resource scheduling system according to the preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A preferred embodiment of the present invention is illustrated in detail as follows. The preferred embodiment is implemented under the prerequisite of the technical scheme of the present invention and gives a detailed embodiment mode and the specific operation process, but the protection scope of the present invention is not limited to the following preferred embodiment.

According to a preferred embodiment of the present invention, as shown in FIG. 2, the present invention provides a resource scheduling system under Graphics Processing Unit (GPU) virtualization based on an instant feedback of an application effect, comprising an agent and a scheduling controller; wherein: the scheduling controller is connected with the agent, for sending user commands into the agent and receiving a GPU condition result returned from the agent; the agent is inserted between a host physical GPU HostOps dispatch and a host physical GPU guest application interface, for delaying corresponding calling and a data down transmission; and, the agent is at the same time responsible for utilizing the host physical GPU guest application interface to collect measurements of a physical condition and/or a logic condition of a GPU. The preferred embodiment is aimed at computer games operating in virtual machines, therefore the collected physical condition and logic condition comprise an application GPU load and frames per second (FPS).

As shown in FIG. 1, the scheduling controller comprises a control console and a scheduling communicator, wherein: the control console is for receiving the user commands; the user commands input a configuration about a scheduling algorithm and corresponding parameters; the control console obtains, periodically/on an event basis, scheduling results from the scheduling communicator and displays them to a user; the scheduling communicator is responsible for a communication between the scheduling controller and one or more agents, and responsible for such operations as installing/unloading the agent, and delivering the user commands to the agent; the event basis means one or more occurrences of a target event, wherein an occurrence time interval is not constant; and an event distribution in respect of time can be mathematically expressed as a time sequence of non-periodic nature.

As shown in FIG. 1, the agent comprises a scheduler and a guest application GPU condition hypervisor; wherein: the scheduler is for receiving designations in the user commands about the configuration of the scheduling algorithm and the corresponding parameters, and is responsible for running the corresponding scheduling algorithm according to the configuration, and delaying sending instructions and data in the GPU HostOps dispatch to the host physical GPU guest application interface, Host GPU API, according to requirements; the guest application GPU condition hypervisor is responsible for collecting a GPU condition coming from the Host GPU API, thereby generating a GPU condition of the guest application, then feeding back periodically/on the event basis the GPU condition result of the guest application to the scheduler, and delivering to the scheduling communicator in the scheduling controller.

The GPU condition of the guest application refers to the measurement of the physical condition and/or the logic condition of the GPU relevant to a guest application variety. According to the preferred embodiment, the collected physical condition and logic condition comprise the application GPU load and the FPS.

The preferred embodiment is aimed at a VMware Player 4.0 virtual machine manager system, and therefore a virtual machine image rendering process, namely a virtual machine process, is designated. According to the preferred embodiment, only a circumstance that a user selects all relevant virtual machine image rendering processes is considered.

According to the preferred embodiment, an applied resource scheduling method under the GPU virtualization based on the instant feedback of the application effect is configured as: a minimum GPU load=80%, minimum FPS=30 and a testing period T, designated by user, T=1 s.

The present invention works through following steps of:

Step I, selecting all relevant virtual machine processes by the user, and executing each virtual machine process with Step II to Step VI;

Step II, creating a new thread in the process, and loading one agent in the process;

Step III, visiting an entrance of the agent, and initializing the agent;

Step IV, finding an address set of the host physical GPU guest application interface loaded by the process, modifying a code at each address of the host physical GPU guest application interface, pointing the code at an entrance of a corresponding handler in the agent and saving contents of all registers by the code, so that the process will run the handler each time when using the host physical GPU guest application interface in future;

Step V, setting a return address of the handler as an old host physical GPU guest application interface address, running the instruction, and resuming the contents of all the registers, so that the handler is able to correctly execute an original host physical GPU guest application interface after an operation of the handler; and

Step VI, not ending the thread.
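
The redirection in Step IV and Step V can be pictured with a high-level analogue. The Python sketch below is not the machine-code patching described above: instead of rewriting instructions at interface addresses and saving registers, it swaps attributes on a stand-in object. The names `host_gpu_api`, `present`, `make_handler`, `install_hooks` and `call_log` are hypothetical.

```python
from types import SimpleNamespace

call_log = []  # hypothetical record of intercepted calls


def make_handler(name, original):
    """Build a handler that runs agent logic, then the original entry point."""
    def handler(*args, **kwargs):
        call_log.append(name)             # agent logic (monitoring, delay) would go here
        return original(*args, **kwargs)  # Step V analogue: continue into the original interface
    return handler


def install_hooks(api, names):
    """Step IV analogue: redirect each named entry to its handler, saving the original."""
    originals = {}
    for name in names:
        original = getattr(api, name)
        originals[name] = original
        setattr(api, name, make_handler(name, original))
    return originals


# Stand-in for the Host GPU API; `present` is an assumed entry point.
host_gpu_api = SimpleNamespace(present=lambda frame: f"presented {frame}")
saved = install_hooks(host_gpu_api, ["present"])
result = host_gpu_api.present(1)  # runs the handler, then the original
```

The dictionary returned by `install_hooks` plays the role of the saved original addresses: it is what a later unloading step would need in order to restore the interface.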

Through the above steps, the agent is bound to the corresponding virtual machine. After establishing the communication between the scheduling communicator in the scheduling controller and the bound agent, the agent is able to send the GPU condition result to the scheduling controller and respond to the user commands issued by the scheduling controller. When it is necessary to schedule GPU resources at some point thereafter, the present invention executes following steps of:

Step 1, for virtual machine groups of VM1, VM2 . . . to VMn, analyzing a user algorithm configuration by the scheduler in the agent of each virtual machine, obtaining the minimum GPU load=80%, the minimum FPS=30 to be met, and the testing period T, designated by the user, T=1 s;

Step 2, during operation, calling the handler for multiple times for collecting the GPU condition and delaying sending the instructions and the data in the GPU HostOps dispatch to the Host GPU API; and for each call of the handler, executing Step 2.1 to Step 2.6;

Step 2.1, in the handler designated by a GPU resource scheduling algorithm, predicting a current consumption time of the GPU according to a historic record of a consumption time of the GPU corresponding to the host physical GPU guest application interface;

Step 2.2, utilizing the Host GPU API and a GPU drive interface, measuring a current application GPU load and current FPS within a current t time; and stopping counting the current consumption time of the CPU;

Step 2.3, stopping the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and a current consumption time of the GPU;

Step 2.4, starting counting the current consumption time of the GPU;

Step 2.5, calling the original host physical GPU guest application interface; and

Step 2.6, stopping counting the current consumption time of the GPU, and updating to the historic record of the consumption time of the GPU corresponding to the host physical GPU guest application interface;

Step 3, for each testing period T, if a virtual machine VMm does not satisfy the minimum FPS, finding and reducing a setting of minimum frames per second of a virtual machine having the maximum and minimum FPS; wherein a reduced magnitude of the FPS depends on an application GPU load for recent frames, and the FPS and the application GPU load for the recent frames have a linear relation;

Step 4, for each testing period T, if a utilization rate of the physical GPU fails to meet the minimum GPU load, increasing a setting of the minimum FPS for all of the virtual machines; wherein an increased magnitude of the FPS depends on the application GPU load for the recent frames, and the FPS and the application GPU load for the recent frames have the linear relation; and

Step 5, keeping Step 2 to Step 4 valid until the algorithm designated by the user ends or the algorithm is changed or the agent is unloaded.
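
A minimal sketch of Step 2 to Step 4 follows, under stated assumptions. The text does not fix how the prediction or the delay is computed, so the moving average in `predict_gpu_time`, the frame-budget formula in `compute_delay`, the linear coefficient `k`, and the choice to slow down the virtual machine with the highest measured FPS in Step 3 are all illustrative guesses rather than the claimed algorithm. All function and parameter names are hypothetical.

```python
from statistics import mean


def predict_gpu_time(history, window=8):
    """Step 2.1 analogue: predict GPU time from the historic record (assumed moving average)."""
    recent = history[-window:]
    return mean(recent) if recent else 0.0


def compute_delay(target_fps, cpu_time_s, predicted_gpu_time_s):
    """Step 2.3 analogue: sleep for whatever remains of the frame budget (assumed formula)."""
    frame_budget_s = 1.0 / target_fps
    return max(0.0, frame_budget_s - cpu_time_s - predicted_gpu_time_s)


def adjust_targets(targets, measured_fps, recent_load, gpu_utilization,
                   min_fps=30.0, min_gpu_load=0.80, k=10.0):
    """Step 3 and Step 4 analogue, run once per testing period T.

    targets, measured_fps and recent_load map a VM name to its FPS setting,
    its measured FPS and its recent application GPU load (0..1).
    """
    new_targets = dict(targets)
    # Step 3: some VM misses the minimum FPS, so lower the setting of the
    # fastest VM by an amount linear in its recent load (assumed reading).
    if any(fps < min_fps for fps in measured_fps.values()):
        fastest = max(measured_fps, key=measured_fps.get)
        new_targets[fastest] = new_targets[fastest] - k * recent_load[fastest]
    # Step 4: the physical GPU is under-used, so raise every VM's setting
    # by an amount linear in that VM's recent load.
    if gpu_utilization < min_gpu_load:
        for vm in new_targets:
            new_targets[vm] = new_targets[vm] + k * recent_load[vm]
    return new_targets
```

With the embodiment's configuration (minimum FPS of 30, minimum GPU load of 80%, T of 1 s), `adjust_targets` would be called once per second, and `compute_delay` once per handler call using the most recent setting for that virtual machine.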

According to the preferred embodiment of the present invention, the agent is unloaded through following steps of:

Step a, receiving an unloading instruction by the agent, and starting an unloading process by the agent from Step b to Step c;

Step b, resuming the address set of the host physical GPU guest application interface loaded by the process, modifying the code at the address of each host physical GPU guest application interface to a content at an address of the original guest application interface address, so that the process will run a logic of the original guest application interface each time when using the host physical GPU guest application interface in future; and

Step c, ending the thread inserted in a process of binding the agent to the corresponding virtual machine, and unloading the agent.
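
The unloading steps above can be pictured with the same kind of high-level analogue. In this Python sketch the saved originals are written back over the handlers and the agent thread is signalled to finish; the embodiment itself restores code at interface addresses rather than swapping attributes. The names `unload_agent`, `api`, `present` and `stop_event` are hypothetical.

```python
import threading
from types import SimpleNamespace


def unload_agent(api, originals, stop_event, agent_thread):
    # Step b analogue: put every original entry point back.
    for name, original in originals.items():
        setattr(api, name, original)
    originals.clear()
    # Step c analogue: let the agent thread finish, then wait for it.
    stop_event.set()
    agent_thread.join()


def original_present(frame):
    return f"presented {frame}"


# Stand-in state as it would look after binding: one hooked entry, one live thread.
api = SimpleNamespace(present=original_present)
saved = {"present": api.present}
api.present = lambda frame: original_present(frame)  # stand-in for an installed handler

stop_event = threading.Event()
agent_thread = threading.Thread(target=stop_event.wait, daemon=True)
agent_thread.start()

unload_agent(api, saved, stop_event, agent_thread)
```

After `unload_agent` returns, calls to `api.present` reach the original function directly and the agent thread is no longer alive.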

One skilled in the art will understand that the embodiment of the present invention as shown in the drawings and described above is exemplary only and not intended to be limiting.

It will thus be seen that the objects of the present invention have been fully and effectively accomplished. Its embodiments have been shown and described for the purposes of illustrating the functional and structural principles of the present invention and are subject to change without departure from such principles. Therefore, this invention includes all modifications encompassed within the spirit and scope of the following claims.

1. A resource scheduling system under GPU virtualization based on an instant feedback of an application effect, comprising a host physical Graphics Processing Unit (GPU) HostOps dispatch, a host physical GPU guest application interface, an agent and a scheduling controller; wherein: the agent is connected between the host physical GPU HostOps dispatch and the host physical GPU guest application interface; the scheduling controller is connected with the agent, the scheduling controller receives user commands, and delivers the user commands to the agent; the agent receives the user commands coming from the scheduling controller, monitors an operating condition of a guest application and transmits a GPU condition result of the guest application to the scheduling controller, and at the same time, calculates periodically/on an event basis a delay time necessary to meet a lowest guest application GPU condition according to a scheduling algorithm designated by the scheduling controller, and delays sending instructions and data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface; and, the scheduling controller receives, processes, and displays a scheduling result and condition coming from the agent.
 2. The resource scheduling system under the GPU virtualization based on the instant feedback of the application effect, as recited in claim 1, wherein: the scheduling controller is for receiving the user commands, analyzing an operation to the agent, a configuration of the scheduling algorithm and corresponding parameters in the user commands, delivering the user commands to the agent, receiving the GPU condition result coming from the agent and displaying to a user.
 3. The resource scheduling system under the GPU virtualization based on the instant feedback of the application effect, as recited in claim 2, wherein the scheduling controller comprises: a control console, for receiving the user commands, wherein the user commands input the configuration of the scheduling algorithm and the corresponding parameters, and acquire the scheduling results from a scheduling communicator and display to users; and the scheduling communicator, responsible for a communication between the scheduling controller and one or more agents, loading/unloading the agent, delivering the user commands to the agent, and receiving the GPU condition results of the guest application coming from the agent.
 4. The resource scheduling system under the GPU virtualization based on the instant feedback of the application effect, as recited in claim 3, wherein the agent comprises: a scheduler, for receiving designations in the user commands about the configuration of the scheduling algorithm and the corresponding parameters, finding a position of the corresponding scheduling algorithm, configuring the scheduling algorithm and operating the corresponding scheduling algorithm, and delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface as required; and a guest application GPU condition hypervisor, for collecting a condition coming from the host physical GPU guest application interface, generating the GPU condition result of the guest application through the GPU condition, and at the same time feeding back the GPU condition result of the guest application to the scheduler and delivering to the scheduling communicator in the scheduling controller.
 5. The resource scheduling system under the GPU virtualization based on the instant feedback of the application effect, as recited in claim 4, wherein the GPU condition of the guest application comprises measurements of a physical condition and/or a logic condition of a GPU relevant to a guest application variety.
 6. A GPU resource scheduling method under GPU virtualization with the resource scheduling system recited in claim 5, wherein: the agent is inserted between the host physical GPU HostOps dispatch and the host physical GPU guest application interface through a hooking method, for delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface and at the same time monitoring a relevant display performance condition of the guest application and a use condition of physical GPU resources, and then providing a feedback to any GPU resource scheduling algorithm based on time or a time sequence; with the agent, it is unneeded for the method to make any modification to a virtual machine guest application, a host operating system, a virtual machine operating system, a GPU drive, and a virtual machine manager, and a performance loss is low; the GPU resource scheduling method comprising steps of: after finishing starting one or more virtual machines, when a customer needs to install the resource scheduling system, through means operated by the guest application, finding a process by the scheduling controller or designating a process by the user, and binding the agent to the corresponding virtual machine according to the process; then establishing a communication between the scheduling communicator in the scheduling controller and the agent; when scheduling the GPU resources, issuing an instruction, selecting the scheduling algorithm and providing the corresponding parameters by the customer; after receiving the instruction from the customer by the control console, sending the user commands to the agent by the scheduling communicator; according to the user commands, by the agent, configuring and operating the selected GPU resource scheduling algorithm, and delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface; at the same time, by the guest application GPU condition hypervisor, collecting the GPU condition coming from the host physical GPU guest application interface, generating the GPU condition of the guest application, then feeding back periodically or on the event basis the GPU condition result of the guest application to the scheduler, and delivering to the scheduling communicator in the scheduling controller; when the customer needs to unload the resource scheduling system, issuing an unloading instruction through the scheduling controller by the customer; receiving the unloading instruction by the control console, sending the user command to the agent by the scheduling communicator, and receiving the unloading instruction and stopping an operation of the agent by the agent.
 7. (canceled)
 8. The method as recited in claim 6, wherein the GPU resource scheduling method adopts a GPU resource usage advanced prediction method; with a cooperation of delaying sending the instructions and the data in the host physical GPU HostOps dispatch, an accurate control of a frame latency is achieved; the GPU resource usage advanced prediction method comprises a frame rendering performance prediction, and a flush single queued frame, wherein: the frame rendering performance prediction comprises steps of: according to a historic record of a consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting a current consumption time of the physical GPU resources; and the flush single queued frame comprises a mark flush frame and a commit flush frame; wherein the mark flush frame is optional, and comprises a step of: marking a frame of the virtual machine in the queue, wherein the marked frame is shown as a frame required to be removed from a buffer; and the commit flush frame comprises a step of: forcing one frame to be removed from the buffer of a physical GPU, in such a manner that the buffer of the physical GPU has enough space.
 9-15. (canceled)
 16. The method, as recited in claim 6, wherein: the step of “binding the agent to the corresponding virtual machine” comprises steps of: Step 1.1, according to information designated by a user, finding image rendering processes of the designated virtual machine, or selecting all of the image rendering processes of the relevant virtual machine, and executing each image rendering process of the virtual machine with following Step 1.2 to Step 1.6; Step 1.2, creating a new thread in the process and loading the agent in the process; Step 1.3, visiting an entrance of the agent, and initializing the agent; Step 1.4, finding an address set of the host physical GPU guest application interface loaded by the process, modifying a code at each address of the host physical GPU guest application interface, pointing the code at an entrance of a corresponding handler in the agent and saving contents of all registers by the code, so that the process will run the handler each time when using the host physical GPU guest application interface in future; Step 1.5, setting a return address of the handler as an old host physical GPU guest application interface address, running the instruction, and resuming the contents of all the registers, so that the handler is able to correctly execute an original host physical GPU guest application interface after ending an operation of the handler; and Step 1.6, not ending the thread.
 17. The method, as recited in claim 8, wherein: the step of “binding the agent to the corresponding virtual machine” comprises steps of: Step 1.1, according to information designated by a user, finding image rendering processes of the designated virtual machine, or selecting all of the image rendering processes of the relevant virtual machine, and executing each image rendering process of the virtual machine with following Step 1.2 to Step 1.6; Step 1.2, creating a new thread in the process and loading the agent in the process; Step 1.3, visiting an entrance of the agent, and initializing the agent; Step 1.4, finding an address set of the host physical GPU guest application interface loaded by the process, modifying a code at each address of the host physical GPU guest application interface, pointing the code at an entrance of a corresponding handler in the agent and saving contents of all registers by the code, so that the process will run the handler each time when using the host physical GPU guest application interface in future; Step 1.5, setting a return address of the handler as an old host physical GPU guest application interface address, running the instruction, and resuming the contents of all the registers, so that the handler is able to correctly execute an original host physical GPU guest application interface after ending an operation of the handler; and Step 1.6, not ending the thread.
 18. The method, as recited in claim 8, wherein: if using forecasting techniques, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of: Step 2.1a, in the handler designated by the GPU resource scheduling algorithm, according to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting the current consumption time of the physical GPU resources, and stopping counting a current consumption time of a Central Processing Unit (CPU); Step 2.2a, stopping an execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the predicted current consumption time of the physical GPU resources; Step 2.3a, starting counting the current consumption time of the physical GPU resources; Step 2.4a, calling the original host physical GPU guest application interface; and Step 2.5a, stopping counting the current consumption time of the physical GPU resources, uploading the current consumption time of the physical GPU resources to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface; if no forecasting technique is used, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of: Step 2.1b, in the handler designated by the GPU resource scheduling algorithm, stopping counting the current consumption time of the CPU and starting counting the current consumption time of the physical GPU resources; Step 2.2b, calling the original host physical GPU guest application interface; Step 2.3b, stopping counting the current consumption time of the physical GPU resources this time; and Step 2.4b, stopping the execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the current consumption time of the physical GPU resources; if using the GPU resource usage advanced prediction method, the step of “delaying sending the instructions and the data in the host physical GPU HostOps dispatch to the host physical GPU guest application interface” comprises steps of: Step 2.1c, in the handler designated by the GPU resource scheduling algorithm, proceeding the commit flush frame in the flush single queued frame; through the commit flush frame, forcing one frame to be removed, wherein if the mark flush frame is executed, the frame is the marked frame, in such a manner that the buffer of the physical GPU has enough space; stopping counting the current consumption time of the CPU; Step 2.2c, by the frame rendering performance prediction, according to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface, predicting the current consumption time of the physical GPU resources; Step 2.3c, stopping the execution of the CPU for a period of time, wherein a length of the period of time is calculated by the scheduling algorithm according to the current consumption time of the CPU and the predicted current consumption time of the physical GPU resources; Step 2.4c, starting counting the current consumption time of the physical GPU resources; Step 2.5c, calling the original host physical GPU guest application interface; Step 2.6c, stopping counting the current consumption time of the physical GPU resources; and Step 2.7c, starting counting a next consumption time of the CPU; executing the mark flush frame in the flush single queued frame, and marking one frame of the virtual machine in the queue by the mark flush frame, wherein the marked frame is shown as a frame required to be removed from the buffer; uploading the current consumption time of the physical GPU resources to the historic record of the consumption time of the physical GPU resources corresponding to the host physical GPU guest application interface.
 19. The method as recited in claim 6, wherein the step of collecting the GPU condition coming from the host physical GPU guest application interface by the guest application GPU condition hypervisor comprises steps of: Step 3.1, in the handler designated by the GPU resource scheduling algorithm, calling the host physical GPU guest application interface, an operating system kernel or an interface provided by a GPU drive; and according to requirements of the GPU resource scheduling algorithm and the user commands, collecting the GPU condition; and Step 3.2, in the handler designated by the GPU resource scheduling algorithm, calling the original host physical GPU guest application interface.
 20. The method as recited in claim 6, wherein the step of “generating a GPU condition of the guest application” comprises steps of: Step 4.1, designating a condition reporting frequency by the user, and acquiring the condition reporting frequency in the agent; Step 4.2, when a condition reporting point comes, by the guest application GPU condition hypervisor in the agent, sending an accumulative condition result to the scheduling communicator in the scheduling controller; and Step 4.3, emptying a condition result buffer of the agent by the agent.
 21. The method, as recited in claim 6, wherein the step of “receiving the unloading instruction and stopping an operation of the agent by the agent” comprises steps of: Step 5.1, after receiving the unloading instruction by each agent, executing following Step 5.2 to Step 5.3 by each agent; Step 5.2, resuming the address set of the host physical GPU guest application interface loaded by the process, modifying the code at the address of each host physical GPU guest application interface to a content at an address of original guest application interface, so that the process will run a logic of the original guest application interface each time when using the host physical GPU guest application interface in future; and Step 5.3, ending the thread inserted into a process of binding the agent into the corresponding virtual machine, and unloading the agent.
 22. The method, as recited in claim 6, wherein the GPU resource scheduling algorithm comprises following steps of: Step 6.1, for virtual machine groups of VM1, VM2 . . . to VMn, analyzing a user method configuration by the scheduler in the agent of each virtual machine, and obtaining a minimum GPU load, minimum frames per second to be met, and a testing period T designated by the user; Step 6.2, during operation, calling the handler for multiple times; and for each call of the handler, executing Step 2.1a to Step 2.5a with the forecasting techniques; or executing Step 2.1b to Step 2.4b without the forecasting techniques; Step 6.3, for each time period T, if a virtual machine VMm does not satisfy a condition measurement, finding and reducing a setting of minimum frames per second of a virtual machine having the maximum and minimum frames per second; wherein: a reduced magnitude of the frames per second depends on an application GPU load of the guest application for recent frames, and the frames per second and the application GPU load for the recent frames have a linear relation; Step 6.4, for each time period T, if a utilization rate of the physical GPU fails to meet the minimum GPU load, increasing a setting of the minimum frames per second for all of the virtual machines; wherein an increased magnitude of the frames per second depends on the application GPU load of the guest application for the recent frames, and the frames per second and the application GPU load of the guest application for the recent frames have the linear relation; and Step 6.5, keeping Step 6.2 to Step 6.4 valid until the method designated by the user ends or the method is changed or the agent is unloaded.
 23. The method, as recited in claim 8, wherein: if the mark flush frame is executed, the frame removed from the buffer is the marked frame. 